Content Modeling Paradigm: an interplay of relationship between Author, Document, Topic, and Words

نویسندگان

  • Deepak Gupta
  • T. F. Lunt
چکیده

for any work of literature, a fundamental issue is to identify the individual(s) who wrote it, and conversely, to identify all of the works that belong to a given individual or to identify the individual who writes many papers on same topic or to identify the topics name that an author works on. Information extraction techniques (such as Author Name and Topic Recognition) have long been used to extract useful pieces of information from text. The types of information to be extracted are generally fixed and well defined. However in some cases, the user goal is more abstract and information types cannot be narrowly defined. For example, a reader of online user reviews typically has the goal of making a good choice and is interested to learn about the different aspects of a topic and author relation (e.g., famous author of a topic, author’s papers with his research field). Some of these aspects may be known by the reader and some others may need to be discovered from the inherent text structure in a large collection. Even for the known aspects (such as “author name” and “topic”), the challenge is to recognize various hidden aspects like number of papers written by an author, his research field, popularity of an author.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

متن کامل

A New Document Embedding Method for News Classification

Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

متن کامل

یک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجره‌های هم‌پوشان

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

متن کامل

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...

متن کامل

Examining the Relationship between “Science” and “Religion” in Socio-Cultural Context of the Renaissance: A Kuhnian Reading of Bacon’s New Atlantis

Thomas Kuhn’s model of paradigm shift as an intra-systemic framework to account for changes within the scientific discourse has been adopted by scholars in different fields as diverse as sociology, theology, economy, and education, to name only a few. The present study argues that the same model can usefully be drawn upon to examine the relationship between ‘science’ and ‘religion’ with some re...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010